13  Week 5: Analysis of Variance (ANOVA)

Introduction to Statistics for Animal Science

Author

AnS 500 - Fall 2025

Published

November 14, 2025

14 Introduction

You’re a cattle nutritionist comparing four different feed formulations for finishing steers. After a 90-day trial with 15 steers per feed type, you’ve measured average daily gain (ADG) for each animal. Now you need to determine: Do these feed formulations produce different growth rates?

Your first instinct might be to run t-tests comparing all pairs of feeds:

  • Feed A vs Feed B
  • Feed A vs Feed C
  • Feed A vs Feed D
  • Feed B vs Feed C
  • Feed B vs Feed D
  • Feed C vs Feed D

That’s 6 t-tests for just 4 groups! But this approach has a critical problem: each test carries a 5% false positive rate, and these errors accumulate. By the time you’ve run 6 tests, your chance of finding at least one “significant” result by pure chance has jumped to about 26%.
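The 26% figure follows directly from compounding the per-test error rate; a one-line check in R:

```r
# Probability of at least one false positive across 6 independent tests at alpha = 0.05
alpha <- 0.05
k <- 6
fwer <- 1 - (1 - alpha)^k
round(fwer, 3)  # approximately 0.265, i.e., about 26%
```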

This is where Analysis of Variance (ANOVA) comes in. ANOVA allows us to test for differences among multiple groups with a single test, maintaining control over our Type I error rate.

Key Questions We’ll Address:

  • Why does running multiple t-tests inflate our error rate?
  • How does ANOVA partition variance to test group differences?
  • What’s the connection between ANOVA and linear regression?
  • How do we check ANOVA assumptions?
  • When ANOVA is significant, which specific groups differ?
  • What are Type I, II, and III sums of squares, and why do they matter?

By the end of this lecture, you’ll be able to conduct one-way ANOVA using modern R approaches (lm() and car::Anova()), check assumptions, perform post-hoc tests, and calculate effect sizes.

Note: Building on Previous Weeks

We’ve already covered:

  • Week 1: P-values and the logic of hypothesis testing
  • Week 2: Descriptive statistics and visualizing group differences
  • Week 3: Sampling distributions and the Central Limit Theorem
  • Week 4: t-tests for comparing one or two groups

This week, we extend t-tests to handle three or more groups and preview the connection to regression (Weeks 7-8).

15 The Multiple Comparisons Problem

15.1 Why Not Just Run Multiple t-Tests?

Let’s simulate what happens when we run multiple pairwise t-tests on groups that actually have no real differences:

Code
# Simulate 5 groups with NO true differences (all μ = 100, σ = 15)
set.seed(123)
n_per_group <- 20
n_groups <- 5

# Create data where H0 is TRUE (all groups have same mean)
groups_data <- tibble(
  group = rep(LETTERS[1:n_groups], each = n_per_group),
  value = rnorm(n_per_group * n_groups, mean = 100, sd = 15)
)

# Visualize
ggplot(groups_data, aes(x = group, y = value, fill = group)) +
  geom_boxplot(alpha = 0.6, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.5, size = 2) +
  geom_hline(yintercept = 100, linetype = "dashed", color = "red", linewidth = 1) +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Five Groups with NO True Differences",
       subtitle = "All groups sampled from same distribution (μ = 100, σ = 15)",
       x = "Group",
       y = "Value") +
  theme(legend.position = "none")

These groups look slightly different due to random sampling, but they were all drawn from the same population. Now let’s run all possible pairwise t-tests:

Code
# All pairwise comparisons
group_pairs <- combn(LETTERS[1:n_groups], 2, simplify = FALSE)

# Run t-tests
pairwise_results <- map_dfr(group_pairs, function(pair) {
  group1_data <- groups_data %>% filter(group == pair[1]) %>% pull(value)
  group2_data <- groups_data %>% filter(group == pair[2]) %>% pull(value)

  test <- t.test(group1_data, group2_data)

  tibble(
    comparison = paste(pair[1], "vs", pair[2]),
    t_stat = test$statistic,
    p_value = test$p.value,
    significant = p_value < 0.05
  )
})

knitr::kable(pairwise_results, digits = 4, align = 'lccc',
             caption = "Pairwise t-Tests for Five Groups (H₀ is TRUE)")
Pairwise t-Tests for Five Groups (H₀ is TRUE)
comparison t_stat p_value significant
A vs B 0.6746 0.5041 FALSE
A vs C 0.1151 0.9089 FALSE
A vs D 0.8501 0.4006 FALSE
A vs E -0.8170 0.4192 FALSE
B vs C -0.5568 0.5810 FALSE
B vs D 0.2401 0.8116 FALSE
B vs E -1.6254 0.1123 FALSE
C vs D 0.7417 0.4628 FALSE
C vs E -0.9486 0.3490 FALSE
D vs E -1.7317 0.0916 FALSE
Code
# How many false positives?
n_comparisons <- nrow(pairwise_results)
n_false_positives <- sum(pairwise_results$significant)

cat(sprintf("\nNumber of comparisons: %d\n", n_comparisons))

Number of comparisons: 10
Code
cat(sprintf("Number of 'significant' results: %d (%.1f%%)\n",
            n_false_positives, 100 * n_false_positives / n_comparisons))
Number of 'significant' results: 0 (0.0%)

In this particular simulation none of the 10 comparisons happened to cross 0.05 (though D vs E, at p = 0.0916, came close). That is luck of the seed: as the next section shows, with 10 comparisons there is roughly a 40% chance of at least one false positive, so repeating this simulation with different seeds would produce at least one spurious “significant” result in about 4 runs out of 10.

15.2 Family-Wise Error Rate

When we run multiple tests, we need to distinguish between:

  • Per-comparison error rate: The α level for a single test (typically 0.05)
  • Family-wise error rate (FWER): The probability of making at least one Type I error across all tests

The probability of making no Type I errors in k independent tests is:

\[P(\text{no Type I errors}) = (1 - \alpha)^k\]

Therefore, the probability of making at least one Type I error is:

\[\text{FWER} = 1 - (1 - \alpha)^k\]

Let’s calculate this for different numbers of comparisons:

Code
# Calculate FWER for different numbers of groups
n_groups_range <- 2:10
comparisons <- choose(n_groups_range, 2)  # Number of pairwise comparisons
alpha <- 0.05

fwer_data <- tibble(
  n_groups = n_groups_range,
  n_comparisons = comparisons,
  fwer = 1 - (1 - alpha)^comparisons,
  fwer_percent = fwer * 100
)

knitr::kable(fwer_data, digits = c(0, 0, 3, 1), align = 'cccc',
             col.names = c("Groups", "Comparisons", "FWER", "FWER (%)"),
             caption = "Family-Wise Error Rate Growth with Multiple Comparisons")
Family-Wise Error Rate Growth with Multiple Comparisons
Groups Comparisons FWER FWER (%)
2 1 0.050 5.0
3 3 0.143 14.3
4 6 0.265 26.5
5 10 0.401 40.1
6 15 0.537 53.7
7 21 0.659 65.9
8 28 0.762 76.2
9 36 0.842 84.2
10 45 0.901 90.1
Code
# Visualize
ggplot(fwer_data, aes(x = n_groups, y = fwer_percent)) +
  geom_line(linewidth = 1.2, color = "darkred") +
  geom_point(size = 3, color = "darkred") +
  geom_hline(yintercept = 5, linetype = "dashed", color = "steelblue") +
  annotate("text", x = 8, y = 7, label = "Per-test α = 5%", color = "steelblue") +
  scale_x_continuous(breaks = 2:10) +
  labs(title = "Family-Wise Error Rate Explodes with Multiple Groups",
       subtitle = "Probability of at least one false positive when all null hypotheses are true",
       x = "Number of Groups",
       y = "Family-Wise Error Rate (%)") +
  theme_minimal(base_size = 12)

Key insight: With just 5 groups (10 comparisons), the FWER jumps to 40%! This means we have a 40% chance of declaring at least one comparison “significant” even when no true differences exist.

Important: ANOVA, the Solution

ANOVA provides a single omnibus test that asks: “Is there any difference among these groups?”

By conducting one test at α = 0.05, we maintain a 5% false positive rate regardless of how many groups we’re comparing.

If ANOVA is significant, then we conduct post-hoc tests with appropriate adjustments to identify which specific groups differ.

15.3 When to Use ANOVA vs t-Tests

Use t-tests when:

  • Comparing exactly 2 groups
  • One-sample or paired designs

Use ANOVA when:

  • Comparing 3 or more groups
  • Testing for any difference among groups
  • You want to control the family-wise error rate

16 One-Way ANOVA: Theory and Intuition

16.1 The Core Idea: Partitioning Variance

ANOVA works by partitioning total variance into two components:

  1. Between-group variance: Variability of group means around the overall mean
  2. Within-group variance: Variability of individual observations around their group means

If the between-group variance is much larger than the within-group variance, this suggests the groups differ.

16.1.1 Visual Intuition

Let’s create two scenarios to build intuition:

Code
# Scenario 1: Large between-group differences (H0 is FALSE)
scenario1 <- tibble(
  group = rep(c("A", "B", "C"), each = 20),
  value = c(rnorm(20, 80, 10), rnorm(20, 100, 10), rnorm(20, 120, 10))
)

# Scenario 2: Small between-group differences (H0 is TRUE)
scenario2 <- tibble(
  group = rep(c("A", "B", "C"), each = 20),
  value = c(rnorm(20, 98, 15), rnorm(20, 100, 15), rnorm(20, 102, 15))
)

# Calculate group means
scenario1_means <- scenario1 %>%
  group_by(group) %>%
  summarise(mean_val = mean(value), .groups = 'drop')

scenario2_means <- scenario2 %>%
  group_by(group) %>%
  summarise(mean_val = mean(value), .groups = 'drop')

# Overall means
overall_mean1 <- mean(scenario1$value)
overall_mean2 <- mean(scenario2$value)

# Plot Scenario 1
p1 <- ggplot(scenario1, aes(x = group, y = value, color = group)) +
  geom_jitter(width = 0.2, alpha = 0.5, size = 2) +
  geom_hline(yintercept = overall_mean1, linetype = "dashed",
             color = "black", linewidth = 1) +
  stat_summary(fun = mean, geom = "point", size = 5, shape = 18, color = "darkred") +
  stat_summary(fun = mean, geom = "point", size = 4, shape = 18,
               aes(color = group)) +
  scale_color_brewer(palette = "Set1") +
  annotate("text", x = 3.3, y = overall_mean1, label = "Overall\nmean",
           size = 3, hjust = 0) +
  labs(title = "Scenario 1: Large Between-Group Variance",
       subtitle = "Group means far from overall mean → F will be large",
       x = "Group", y = "Value") +
  theme(legend.position = "none")

# Plot Scenario 2
p2 <- ggplot(scenario2, aes(x = group, y = value, color = group)) +
  geom_jitter(width = 0.2, alpha = 0.5, size = 2) +
  geom_hline(yintercept = overall_mean2, linetype = "dashed",
             color = "black", linewidth = 1) +
  stat_summary(fun = mean, geom = "point", size = 5, shape = 18, color = "darkred") +
  stat_summary(fun = mean, geom = "point", size = 4, shape = 18,
               aes(color = group)) +
  scale_color_brewer(palette = "Set1") +
  annotate("text", x = 3.3, y = overall_mean2, label = "Overall\nmean",
           size = 3, hjust = 0) +
  labs(title = "Scenario 2: Small Between-Group Variance",
       subtitle = "Group means close to overall mean → F will be small",
       x = "Group", y = "Value") +
  theme(legend.position = "none")

p1 + p2

Key observation:

  • Scenario 1: Group means (colored diamonds) are far from the overall mean (black line) → large between-group variance
  • Scenario 2: Group means are close to the overall mean → small between-group variance

16.2 The F-Statistic

The F-statistic is the ratio of between-group variance to within-group variance:

\[F = \frac{\text{Between-group variance (MS}_{\text{between}}\text{)}}{\text{Within-group variance (MS}_{\text{within}}\text{)}}\]

Where:

  • MS = Mean Square (a variance estimate)
  • A large F-ratio suggests group means differ more than expected by chance
  • F follows an F-distribution with df₁ (between) and df₂ (within) degrees of freedom

Note: Connection to the t-Test

When comparing exactly two groups, ANOVA is mathematically equivalent to a t-test:

\[F = t^2\]

ANOVA with 2 groups will give the same p-value as a pooled-variance (Student's) two-sample t-test!
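A minimal sketch verifying the equivalence on simulated data (not the feed trial; note it requires the pooled-variance t-test, `var.equal = TRUE`, since R's default Welch test uses different degrees of freedom):

```r
# Two groups drawn from normal distributions with different means
set.seed(42)
two_groups <- data.frame(
  group = rep(c("A", "B"), each = 10),
  value = c(rnorm(10, 100, 15), rnorm(10, 110, 15))
)

# Pooled-variance t-test and one-way ANOVA on the same data
t_res <- t.test(value ~ group, data = two_groups, var.equal = TRUE)
f_res <- anova(lm(value ~ group, data = two_groups))

t_res$statistic^2   # t squared ...
f_res$`F value`[1]  # ... equals the ANOVA F, to machine precision
```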

16.3 ANOVA Table Components

The ANOVA table summarizes the variance partitioning:

Source Sum of Squares (SS) df Mean Square (MS) F p-value
Between SS_between k - 1 SS_between / df_between MS_between / MS_within P(F > F_obs)
Within SS_within N - k SS_within / df_within
Total SS_total N - 1

Where:

  • k = number of groups; N = total sample size
  • SS_between: sum of squared differences between group means and the overall mean
  • SS_within: sum of squared differences between observations and their group means
  • df = degrees of freedom
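To make the partitioning concrete, here is a sketch of the sums of squares computed by hand in base R (the data frame and its `group`/`value` column names are illustrative assumptions):

```r
# Hand-compute the ANOVA variance partition for a data frame `d`
# with a grouping column `group` and a response column `value`
ss_partition <- function(d) {
  grand_mean  <- mean(d$value)
  group_means <- ave(d$value, d$group)  # each observation's group mean
  c(
    between = sum((group_means - grand_mean)^2),  # SS_between
    within  = sum((d$value - group_means)^2),     # SS_within
    total   = sum((d$value - grand_mean)^2)       # SS_total
  )
}
```

By construction, between + within equals total, and the between entry matches the factor's Sum Sq in the ANOVA table.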

Let’s calculate ANOVA for our two scenarios:

Code
# Fit models
model1 <- lm(value ~ group, data = scenario1)
model2 <- lm(value ~ group, data = scenario2)

# ANOVA tables using car::Anova (Type II)
cat("Scenario 1: Large Between-Group Variance\n")
Scenario 1: Large Between-Group Variance
Code
cat("==========================================\n")
==========================================
Code
Anova(model1, type = "II")
Anova Table (Type II tests)

Response: value
          Sum Sq Df F value    Pr(>F)    
group      17393  2  93.071 < 2.2e-16 ***
Residuals   5326 57                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
cat("\n\nScenario 2: Small Between-Group Variance\n")


Scenario 2: Small Between-Group Variance
Code
cat("==========================================\n")
==========================================
Code
Anova(model2, type = "II")
Anova Table (Type II tests)

Response: value
           Sum Sq Df F value Pr(>F)
group        40.4  2  0.0881 0.9158
Residuals 13081.2 57               

Interpretation:

  • Scenario 1: F = 93.1, p < 0.001 → strong evidence of group differences
  • Scenario 2: F = 0.09, p = 0.916 → no evidence of group differences

17 ANOVA as a Linear Model

17.1 Why Use lm() Instead of aov()?

In R, you can fit ANOVA using either aov() or lm(). We’ll use lm() because:

  1. More flexible: Same framework works for t-tests, ANOVA, and regression
  2. Prepares for regression: Weeks 7-8 will use lm() for continuous predictors
  3. Modern approach: Works seamlessly with car::Anova(), emmeans, and other packages
  4. Better diagnostics: Standard regression diagnostic plots apply
Tip: ANOVA IS Regression

One-way ANOVA is actually a special case of linear regression where the predictor is categorical. R internally converts group labels to dummy variables.

This connection will become clearer in Weeks 7-8!
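You can inspect this dummy coding directly with base R's model.matrix(); a tiny standalone sketch (the factor here is illustrative, not the feed-trial data):

```r
# R's default treatment contrasts: level "A" becomes the reference
# (intercept), and each other level gets its own 0/1 indicator column
f <- factor(c("A", "A", "B", "C"))
model.matrix(~ f)
# Columns: (Intercept) = all 1s, fB = 1 for "B" rows, fC = 1 for "C" rows
```

This is why the lm() coefficients later in this section are differences from Feed A rather than raw group means.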

17.2 Fitting ANOVA with lm()

Let’s work through a complete example using the cattle feed trial scenario:

Code
# Simulate feed trial data
# 4 feed types, 15 steers per feed, 90-day trial measuring average daily gain (kg)
set.seed(456)

feed_data <- tibble(
  steer_id = 1:60,
  feed_type = factor(rep(c("A", "B", "C", "D"), each = 15)),
  # Simulated ADG with true differences:
  # A: 1.45, B: 1.55, C: 1.50, D: 1.70 kg/day
  adg_kg = c(
    rnorm(15, mean = 1.45, sd = 0.18),  # Feed A
    rnorm(15, mean = 1.55, sd = 0.18),  # Feed B
    rnorm(15, mean = 1.50, sd = 0.18),  # Feed C
    rnorm(15, mean = 1.70, sd = 0.18)   # Feed D
  )
)

# Summary statistics
feed_summary <- feed_data %>%
  group_by(feed_type) %>%
  summarise(
    n = n(),
    mean_adg = mean(adg_kg),
    sd_adg = sd(adg_kg),
    se_adg = sd_adg / sqrt(n),
    .groups = 'drop'
  )

knitr::kable(feed_summary, digits = 3, align = 'lcccc',
             col.names = c("Feed", "n", "Mean ADG", "SD", "SE"),
             caption = "Summary Statistics: Average Daily Gain by Feed Type")
Summary Statistics: Average Daily Gain by Feed Type
Feed n Mean ADG SD SE
A 15 1.471 0.189 0.049
B 15 1.612 0.231 0.060
C 15 1.502 0.175 0.045
D 15 1.754 0.125 0.032

17.2.1 Step 1: Visualize the Data

Always visualize before testing!

Code
# Create comprehensive visualization
p1 <- ggplot(feed_data, aes(x = feed_type, y = adg_kg, fill = feed_type)) +
  geom_boxplot(alpha = 0.6, outlier.shape = NA) +
  geom_jitter(width = 0.15, alpha = 0.5, size = 2) +
  stat_summary(fun = mean, geom = "point", shape = 23, size = 3,
               fill = "darkred", color = "darkred") +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Average Daily Gain by Feed Type",
       subtitle = "Diamonds show group means",
       x = "Feed Type",
       y = "Average Daily Gain (kg/day)") +
  theme(legend.position = "none")

p2 <- ggplot(feed_summary, aes(x = feed_type, y = mean_adg, fill = feed_type)) +
  geom_col(alpha = 0.7, width = 0.6) +
  geom_errorbar(aes(ymin = mean_adg - se_adg * 1.96,
                    ymax = mean_adg + se_adg * 1.96),
                width = 0.2, linewidth = 1) +
  geom_text(aes(label = sprintf("%.2f", mean_adg)),
            vjust = -3, fontface = "bold", size = 3.5) +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Mean ADG with 95% Confidence Intervals",
       x = "Feed Type",
       y = "Mean ADG (kg/day)") +
  ylim(0, 2) +
  theme(legend.position = "none")

p1 + p2

Visual assessment: Feed D appears to produce the highest gain, while Feed A appears lowest. But are these differences statistically significant?

17.2.2 Step 2: Fit the Linear Model

Code
# Fit ANOVA as a linear model
feed_model <- lm(adg_kg ~ feed_type, data = feed_data)

# Model summary
summary(feed_model)

Call:
lm(formula = adg_kg ~ feed_type, data = feed_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.37129 -0.12107  0.01025  0.10413  0.37211 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.47125    0.04744  31.016  < 2e-16 ***
feed_typeB   0.14093    0.06708   2.101   0.0402 *  
feed_typeC   0.03084    0.06708   0.460   0.6475    
feed_typeD   0.28287    0.06708   4.217 9.14e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1837 on 56 degrees of freedom
Multiple R-squared:  0.2806,    Adjusted R-squared:  0.2421 
F-statistic: 7.282 on 3 and 56 DF,  p-value: 0.0003302

Key points from the summary:

  • R-squared: proportion of variance in ADG explained by feed type
  • The coefficients show differences from the reference group (Feed A)
  • But we want the ANOVA table…

17.2.3 Step 3: ANOVA Table with car::Anova()

Code
# ANOVA table using car package (Type II SS)
library(car)
feed_anova <- Anova(feed_model, type = "II")

print(feed_anova)
Anova Table (Type II tests)

Response: adg_kg
           Sum Sq Df F value    Pr(>F)    
feed_type 0.73731  3  7.2816 0.0003302 ***
Residuals 1.89012 56                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
# Extract F and p-value
f_stat <- feed_anova$`F value`[1]
p_val <- feed_anova$`Pr(>F)`[1]

cat(sprintf("\nANOVA Results: F(%d, %d) = %.2f, p %s\n",
            feed_anova$Df[1], feed_anova$Df[2], f_stat,
            ifelse(p_val < 0.001, "< 0.001", sprintf("= %.4f", p_val))))

ANOVA Results: F(3, 56) = 7.28, p < 0.001
Important: What This ANOVA Tells Us

A significant F-test (p < 0.05) tells us:

There IS evidence that at least one feed type differs from the others

It does NOT tell us which specific feeds differ

To identify specific differences, we need post-hoc tests (covered in Section 20).

18 Type I, II, and III Sums of Squares

18.1 Why This Matters

When you have balanced designs (equal sample sizes), all methods give the same results. But with unbalanced designs (common in real research!), the choice matters.

Key differences:

  • Type I (Sequential): Tests each term after the terms listed before it in the formula (order-dependent)
  • Type II (Hierarchical): Tests each term after all other terms at the same or lower level, ignoring interactions that contain it (recommended for one-way ANOVA)
  • Type III (Marginal): Tests each term after all other terms, including interactions (the default in SAS and SPSS)
Note: Recommendation for One-Way ANOVA

For one-way ANOVA (single factor), Type II and Type III give identical results.

Use car::Anova(model, type = "II") as your default.

Type II/III differences become important with multiple factors or interactions (beyond this course).

18.2 Demonstrating the Difference

Let’s create an unbalanced version of our feed trial and compare methods:

Code
# Create unbalanced design
set.seed(789)
feed_data_unbal <- tibble(
  feed_type = c(rep("A", 12), rep("B", 15), rep("C", 18), rep("D", 15)),
  adg_kg = c(
    rnorm(12, mean = 1.45, sd = 0.18),
    rnorm(15, mean = 1.55, sd = 0.18),
    rnorm(18, mean = 1.50, sd = 0.18),
    rnorm(15, mean = 1.70, sd = 0.18)
  )
) %>%
  mutate(feed_type = factor(feed_type))

# Sample sizes
feed_data_unbal %>%
  count(feed_type) %>%
  knitr::kable(col.names = c("Feed Type", "n"),
               caption = "Unbalanced Design Sample Sizes")
Unbalanced Design Sample Sizes
Feed Type n
A 12
B 15
C 18
D 15
Code
# Fit model
model_unbal <- lm(adg_kg ~ feed_type, data = feed_data_unbal)

# Compare Type I, II, and III
cat("Type I (Sequential) Sums of Squares:\n")
Type I (Sequential) Sums of Squares:
Code
cat("=====================================\n")
=====================================
Code
print(anova(model_unbal))  # Base R uses Type I
Analysis of Variance Table

Response: adg_kg
          Df  Sum Sq  Mean Sq F value    Pr(>F)    
feed_type  3 0.80928 0.269760  9.8032 2.698e-05 ***
Residuals 56 1.54098 0.027517                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
cat("\n\nType II Sums of Squares:\n")


Type II Sums of Squares:
Code
cat("========================\n")
========================
Code
print(Anova(model_unbal, type = "II"))
Anova Table (Type II tests)

Response: adg_kg
           Sum Sq Df F value    Pr(>F)    
feed_type 0.80928  3  9.8032 2.698e-05 ***
Residuals 1.54098 56                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
cat("\n\nType III Sums of Squares:\n")


Type III Sums of Squares:
Code
cat("=========================\n")
=========================
Code
print(Anova(model_unbal, type = "III"))
Anova Table (Type III tests)

Response: adg_kg
             Sum Sq Df  F value    Pr(>F)    
(Intercept) 22.7191  1 825.6263 < 2.2e-16 ***
feed_type    0.8093  3   9.8032 2.698e-05 ***
Residuals    1.5410 56                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With a single factor there is only one term in the model, so Type I, II, and III all give the same result, as the identical sums of squares above confirm. The three types diverge only in unbalanced designs with two or more factors.

Bottom line: For this course (one-way ANOVA), always use:

car::Anova(model, type = "II")

19 ANOVA Assumptions and Diagnostics

ANOVA makes three key assumptions:

  1. Independence: Observations are independent (addressed by study design)
  2. Normality: Residuals are normally distributed
  3. Homogeneity of variance: Equal variance across groups

19.1 Important: Test Residuals, Not Raw Data!

Warning: Common Mistake

Don’t test normality of raw data within each group. Instead, test normality of residuals from the model.

Why? ANOVA assumes that deviations from group means are normally distributed, not that each group is normal independently.

19.2 Diagnostic Plots

The lm() object provides four standard diagnostic plots:

Code
# Create diagnostic plots
par(mfrow = c(2, 2))
plot(feed_model, which = 1:4)

Code
par(mfrow = c(1, 1))

Interpreting diagnostic plots:

  1. Residuals vs Fitted: Should show random scatter (no pattern)
    • Pattern suggests non-constant variance or non-linearity
  2. Normal Q-Q: Points should follow the diagonal line
    • Deviations suggest non-normality
  3. Scale-Location: Should show random scatter
    • Tests homogeneity of variance (constant spread)
  4. Residuals vs Leverage: Identifies influential points
    • Points outside Cook’s distance contours are influential

19.3 Assumption Checks

19.3.1 1. Independence

Cannot be tested statistically - depends on study design.

Violations occur when:

  • Repeated measures are taken on the same subjects (use mixed models)
  • Data are clustered (animals in the same pen, plots in the same field, etc.)
  • Data form a time series (autocorrelation)

Solution: Design study properly or use appropriate models (mixed models, GEE).

19.3.2 2. Normality of Residuals

Visual check: Q-Q plot

Code
# Extract residuals
feed_residuals <- residuals(feed_model)

# Q-Q plot
ggplot(tibble(residuals = feed_residuals), aes(sample = residuals)) +
  stat_qq(color = "steelblue", size = 2) +
  stat_qq_line(color = "red", linetype = "dashed", linewidth = 1) +
  labs(title = "Normal Q-Q Plot of Residuals",
       subtitle = "Points should fall close to the line",
       x = "Theoretical Quantiles",
       y = "Sample Quantiles") +
  theme_minimal(base_size = 12)

Formal test: Shapiro-Wilk

Code
shapiro_result <- shapiro.test(feed_residuals)

cat(sprintf("Shapiro-Wilk Test for Normality\n"))
Shapiro-Wilk Test for Normality
Code
cat(sprintf("W = %.4f, p-value = %.4f\n",
            shapiro_result$statistic, shapiro_result$p.value))
W = 0.9869, p-value = 0.7660
Code
if(shapiro_result$p.value > 0.05) {
  cat("→ No evidence against normality (p > 0.05)\n")
} else {
  cat("→ Some evidence against normality (p < 0.05)\n")
  cat("  Consider: transformation, non-parametric test, or rely on robustness\n")
}
→ No evidence against normality (p > 0.05)
Note: ANOVA Is Robust to Normality Violations

With moderate to large sample sizes and no extreme skewness, ANOVA is fairly robust to violations of normality, especially with balanced designs.

If normality is violated:

  • Try transformations (log, square root, Box-Cox)
  • Use the Kruskal-Wallis test (non-parametric alternative)
  • Bootstrap confidence intervals

19.3.3 3. Homogeneity of Variance

Visual check: Boxplots

Code
ggplot(feed_data, aes(x = feed_type, y = adg_kg)) +
  geom_boxplot(aes(fill = feed_type), alpha = 0.5) +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
               geom = "errorbar", width = 0.3, linewidth = 1, color = "darkred") +
  stat_summary(fun = mean, geom = "point", size = 3, color = "darkred") +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Checking Homogeneity of Variance",
       subtitle = "Red bars show mean ± SD; similar heights suggest equal variances",
       x = "Feed Type",
       y = "Average Daily Gain (kg/day)") +
  theme(legend.position = "none")

Formal test: Levene’s Test

Code
levene_result <- leveneTest(feed_model)

print(levene_result)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3   2.119 0.1081
      56               
Code
cat(sprintf("\nLevene's Test: F(%d, %d) = %.3f, p = %.4f\n",
            levene_result$Df[1], levene_result$Df[2],
            levene_result$`F value`[1], levene_result$`Pr(>F)`[1]))

Levene's Test: F(3, 56) = 2.119, p = 0.1081
Code
if(levene_result$`Pr(>F)`[1] > 0.05) {
  cat("→ No evidence of unequal variances (p > 0.05)\n")
} else {
  cat("→ Evidence of unequal variances (p < 0.05)\n")
  cat("  Consider: Welch's ANOVA (oneway.test) or transformation\n")
}
→ No evidence of unequal variances (p > 0.05)

Rule of thumb: If the ratio of largest to smallest SD is < 2, you’re usually fine.

Code
sd_ratio <- max(feed_summary$sd_adg) / min(feed_summary$sd_adg)
cat(sprintf("Ratio of largest to smallest SD: %.2f\n", sd_ratio))
Ratio of largest to smallest SD: 1.85
Code
if(sd_ratio < 2) {
  cat("→ Ratio < 2, variances similar enough for standard ANOVA\n")
} else {
  cat("→ Ratio ≥ 2, consider Welch's ANOVA\n")
}
→ Ratio < 2, variances similar enough for standard ANOVA

19.4 What to Do If Assumptions Are Violated

19.4.1 Unequal Variances → Welch’s ANOVA

Code
# Welch's one-way test (doesn't assume equal variances)
welch_result <- oneway.test(adg_kg ~ feed_type, data = feed_data)

print(welch_result)

    One-way analysis of means (not assuming equal variances)

data:  adg_kg and feed_type
F = 10.682, num df = 3.00, denom df = 30.35, p-value = 5.907e-05
Code
cat(sprintf("\nWelch's ANOVA: F(%.2f, %.2f) = %.3f, p = %.4f\n",
            welch_result$parameter[1], welch_result$parameter[2],
            welch_result$statistic, welch_result$p.value))

Welch's ANOVA: F(3.00, 30.35) = 10.682, p = 0.0001

19.4.2 Non-Normality → Kruskal-Wallis Test

Code
# Kruskal-Wallis (non-parametric alternative to ANOVA)
kruskal_result <- kruskal.test(adg_kg ~ feed_type, data = feed_data)

print(kruskal_result)

    Kruskal-Wallis rank sum test

data:  adg_kg by feed_type
Kruskal-Wallis chi-squared = 15.964, df = 3, p-value = 0.001153
Code
cat(sprintf("\nKruskal-Wallis: χ²(%d) = %.3f, p = %.4f\n",
            kruskal_result$parameter,
            kruskal_result$statistic,
            kruskal_result$p.value))

Kruskal-Wallis: χ²(3) = 15.964, p = 0.0012

Note: Kruskal-Wallis tests for differences in distributions, not just means. If significant, use Dunn’s test for post-hoc comparisons.

20 Post-Hoc Tests: Which Groups Differ?

A significant ANOVA tells us “at least one group differs” but not which pairs of groups differ. Post-hoc tests address this.

20.1 The Multiple Comparisons Problem Returns

If ANOVA is significant, you might be tempted to run all pairwise t-tests. But this brings back the multiple comparisons problem!

Solution: Post-hoc tests that adjust p-values to control family-wise error rate.
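Base R's p.adjust() applies such corrections to any vector of p-values; a minimal sketch with made-up raw p-values (the emmeans workflow below handles this automatically):

```r
# Four illustrative raw p-values from four hypothetical comparisons
raw_p <- c(0.001, 0.012, 0.040, 0.300)

p.adjust(raw_p, method = "bonferroni")  # multiply by k, cap at 1
p.adjust(raw_p, method = "holm")        # step-down; uniformly less conservative
```

Note that the raw p = 0.040 comparison survives neither adjustment here: controlling the family-wise error rate costs some power on borderline results.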

20.2 Common Post-Hoc Methods

Method Use When Notes
Tukey HSD All pairwise comparisons Works with balanced or unbalanced designs
Bonferroni Small number of comparisons Conservative; any design
Dunnett’s Comparing treatments to a control Control comparisons only
Scheffé Complex contrasts Most conservative
No adjustment DON’T DO THIS Inflates Type I error
Tip: Recommended: Tukey HSD

Tukey’s Honestly Significant Difference is the most commonly used post-hoc test. It:

  • Controls the family-wise error rate at α
  • Works well with balanced and unbalanced designs
  • Tests all pairwise comparisons

20.3 Post-Hoc Tests Using emmeans

The emmeans (estimated marginal means) package provides a modern, flexible framework for post-hoc tests:

Code
library(emmeans)

# Step 1: Compute estimated marginal means
feed_emm <- emmeans(feed_model, ~ feed_type)

print(feed_emm)
 feed_type emmean     SE df lower.CL upper.CL
 A           1.47 0.0474 56     1.38     1.57
 B           1.61 0.0474 56     1.52     1.71
 C           1.50 0.0474 56     1.41     1.60
 D           1.75 0.0474 56     1.66     1.85

Confidence level used: 0.95 
Code
# Step 2: Pairwise comparisons with Tukey adjustment
feed_pairs_tukey <- pairs(feed_emm, adjust = "tukey")

print(feed_pairs_tukey)
 contrast estimate     SE df t.ratio p.value
 A - B     -0.1409 0.0671 56  -2.101  0.1654
 A - C     -0.0308 0.0671 56  -0.460  0.9674
 A - D     -0.2829 0.0671 56  -4.217  0.0005
 B - C      0.1101 0.0671 56   1.641  0.3644
 B - D     -0.1419 0.0671 56  -2.116  0.1606
 C - D     -0.2520 0.0671 56  -3.757  0.0023

P value adjustment: tukey method for comparing a family of 4 estimates 
Code
# Clean summary
feed_pairs_tukey_df <- as.data.frame(feed_pairs_tukey)

knitr::kable(feed_pairs_tukey_df, digits = 4, align = 'lcccc',
             caption = "Tukey HSD Post-Hoc Comparisons")
Tukey HSD Post-Hoc Comparisons
contrast estimate SE df t.ratio p.value
A - B -0.1409 0.0671 56 -2.1008 0.1654
A - C -0.0308 0.0671 56 -0.4597 0.9674
A - D -0.2829 0.0671 56 -4.2167 0.0005
B - C 0.1101 0.0671 56 1.6411 0.3644
B - D -0.1419 0.0671 56 -2.1159 0.1606
C - D -0.2520 0.0671 56 -3.7569 0.0023

Interpretation:

  • estimate: difference in means (first feed minus second feed)
  • SE: standard error of the difference
  • t.ratio: test statistic
  • p.value: adjusted p-value (controls the FWER)

Which comparisons are significant (p < 0.05)?

Code
sig_pairs <- feed_pairs_tukey_df %>%
  filter(p.value < 0.05) %>%
  dplyr::select(contrast, estimate, p.value)

if(nrow(sig_pairs) > 0) {
  cat("Significant pairwise differences:\n")
  knitr::kable(sig_pairs, digits = 4,
               caption = "Significant Pairs (Tukey-adjusted p < 0.05)")
} else {
  cat("No significant pairwise differences after Tukey adjustment.\n")
}
Significant pairwise differences:
Significant Pairs (Tukey-adjusted p < 0.05)
contrast estimate p.value
A - D -0.2829 0.0005
C - D -0.2520 0.0023

20.4 Comparing Adjustment Methods

Let’s compare Tukey vs Bonferroni vs no adjustment:

Code
# Different adjustments
pairs_none <- pairs(feed_emm, adjust = "none")
pairs_bonf <- pairs(feed_emm, adjust = "bonferroni")
pairs_tukey <- pairs(feed_emm, adjust = "tukey")

# Extract p-values
comparison_df <- tibble(
  Comparison = as.character(feed_pairs_tukey_df$contrast),
  `No Adjust` = summary(pairs_none)$p.value,
  Bonferroni = summary(pairs_bonf)$p.value,
  Tukey = summary(pairs_tukey)$p.value
)

knitr::kable(comparison_df, digits = 4, align = 'lccc',
             caption = "P-values Under Different Adjustment Methods")
P-values Under Different Adjustment Methods
Comparison No Adjust Bonferroni Tukey
A - B 0.0402 0.2410 0.1654
A - C 0.6475 1.0000 0.9674
A - D 0.0001 0.0005 0.0005
B - C 0.1064 0.6383 0.3644
B - D 0.0388 0.2329 0.1606
C - D 0.0004 0.0025 0.0023

Observations:

  • No adjustment gives the smallest p-values (most “discoveries”) but inflates the Type I error rate
  • Bonferroni is the most conservative (largest p-values)
  • Tukey sits in between and is recommended for all-pairwise comparisons

20.5 Visualizing Post-Hoc Results

20.5.1 Compact Letter Display

A compact letter display shows which groups differ using letters:

Code
# Compact letter display
feed_cld <- cld(feed_emm, Letters = letters, adjust = "tukey")

print(feed_cld)
 feed_type emmean     SE df lower.CL upper.CL .group
 A           1.47 0.0474 56     1.35     1.59  a    
 C           1.50 0.0474 56     1.38     1.62  a    
 B           1.61 0.0474 56     1.49     1.73  ab   
 D           1.75 0.0474 56     1.63     1.88   b   

Confidence level used: 0.95 
Conf-level adjustment: sidak method for 4 estimates 
P value adjustment: tukey method for comparing a family of 4 estimates 
significance level used: alpha = 0.05 
NOTE: If two or more means share the same grouping symbol,
      then we cannot show them to be different.
      But we also did not show them to be the same. 
Code
# Visualize with letters
feed_cld_df <- as.data.frame(feed_cld)

ggplot(feed_cld_df, aes(x = feed_type, y = emmean, fill = feed_type)) +
  geom_col(alpha = 0.7, width = 0.6) +
  geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL),
                width = 0.2, linewidth = 1) +
  geom_text(aes(label = sprintf("%.2f", emmean)),
            vjust = -2.5, fontface = "bold", size = 4) +
  geom_text(aes(label = .group), vjust = -4.5, size = 5, fontface = "bold") +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Post-Hoc Results: Tukey HSD",
       subtitle = "Groups sharing a letter are not significantly different",
       x = "Feed Type",
       y = "Estimated Marginal Mean ADG (kg/day)") +
  ylim(0, 2) +
  theme(legend.position = "none")

Reading the plot:

  • Groups with different letters are significantly different
  • Groups with the same letter are not significantly different

20.5.2 Pairwise Comparison Plot

Code
# Pairwise comparison plot
plot(feed_pairs_tukey) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Pairwise Comparisons with 95% Confidence Intervals",
       subtitle = "Intervals not crossing zero indicate significant differences",
       x = "Difference in Mean ADG (kg/day)")

20.6 Alternative: Using TukeyHSD()

For comparison, here’s the traditional approach using aov() and TukeyHSD():

Code
# Fit with aov() (alternative to lm)
feed_aov <- aov(adg_kg ~ feed_type, data = feed_data)

# Tukey HSD
tukey_traditional <- TukeyHSD(feed_aov)

print(tukey_traditional)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = adg_kg ~ feed_type, data = feed_data)

$feed_type
           diff         lwr        upr     p adj
B-A  0.14092989 -0.03670144 0.31856122 0.1654298
C-A  0.03084118 -0.14679015 0.20847251 0.9674384
D-A  0.28287289  0.10524156 0.46050422 0.0005176
C-B -0.11008871 -0.28772004 0.06754262 0.3644403
D-B  0.14194300 -0.03568833 0.31957433 0.1606251
D-C  0.25203172  0.07440039 0.42966305 0.0022679
Code
# Visualize
plot(tukey_traditional, las = 1, col = "steelblue")

Note: Both approaches (emmeans and TukeyHSD) give equivalent results; TukeyHSD() simply reports each difference with the opposite sign (e.g., B - A rather than A - B). We recommend emmeans for its flexibility and integration with lm().

21 Effect Sizes for ANOVA

Just like with t-tests, statistical significance doesn’t tell us about practical importance. Effect sizes quantify the magnitude of group differences.

21.1 Eta-Squared (η²)

Eta-squared is the proportion of total variance explained by the grouping variable:

\[\eta^2 = \frac{SS_{between}}{SS_{total}}\]

Code
library(effectsize)

# Calculate eta-squared
eta_sq <- eta_squared(feed_model)

print(eta_sq)
# Effect Size for ANOVA

Parameter | Eta2 |       95% CI
-------------------------------
feed_type | 0.28 | [0.10, 1.00]

- One-sided CIs: upper bound fixed at [1.00].
Code
cat(sprintf("\nEta-squared: %.3f (%.1f%% of variance explained)\n",
            eta_sq$Eta2[1], eta_sq$Eta2[1] * 100))

Eta-squared: 0.281 (28.1% of variance explained)

21.2 Omega-Squared (ω²)

Omega-squared is a less biased estimate of effect size (adjusts for sample size):

\[\omega^2 = \frac{SS_{between} - (k-1)MS_{within}}{SS_{total} + MS_{within}}\]

Code
# Calculate omega-squared
omega_sq <- omega_squared(feed_model)

print(omega_sq)
# Effect Size for ANOVA

Parameter | Omega2 |       95% CI
---------------------------------
feed_type |   0.24 | [0.07, 1.00]

- One-sided CIs: upper bound fixed at [1.00].
Code
cat(sprintf("\nOmega-squared: %.3f\n", omega_sq$Omega2[1]))

Omega-squared: 0.239

Interpretation: ω² is typically slightly smaller than η² and is preferred for reporting.

21.3 Effect Size Guidelines

Cohen’s (1988) benchmarks for η² and ω²:

Effect Size   η² / ω²   Interpretation
Small         0.01      1% of variance explained
Medium        0.06      6% of variance explained
Large         0.14      14% of variance explained
Code
omega_val <- omega_sq$Omega2[1]

if(omega_val < 0.01) {
  effect_interp <- "negligible"
} else if(omega_val < 0.06) {
  effect_interp <- "small"
} else if(omega_val < 0.14) {
  effect_interp <- "medium"
} else {
  effect_interp <- "large"
}

cat(sprintf("Effect size interpretation: %s\n", effect_interp))
Effect size interpretation: large
Important: Always Report Effect Sizes

When reporting ANOVA results, include:

  1. F-statistic and p-value
  2. Effect size (η² or ω²)
  3. Descriptive statistics (means, SDs)
  4. Post-hoc results if significant

Example (illustrative values): Feed type significantly affected ADG, F(3, 56) = 12.45, p < 0.001, ω² = 0.35. Post-hoc tests revealed Feed D produced significantly higher gain than all other feeds.

22 Practical Considerations

22.1 Unbalanced Designs

Unbalanced designs (unequal sample sizes across groups) are common in practice due to:

  • Attrition (animals get sick, die, or are removed)
  • Unequal recruitment
  • Practical constraints

Consequences:

  • Type I, II, and III SS can differ (once more than one factor is in the model)
  • Slightly reduced power
  • Post-hoc tests still work

Recommendation: Use Type II SS with car::Anova(model, type = "II")
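For a model with a single factor, the three SS types actually coincide; they diverge once a second, unbalanced factor enters the model. A minimal sketch with hypothetical two-factor data (diet, sex, and gain are made up for illustration):

```r
library(car)

# Hypothetical unbalanced two-factor data: with unequal cell sizes,
# Type I (sequential) and Type II SS differ for the first term
set.seed(123)
dat <- data.frame(
  diet = factor(rep(c("A", "B"), times = c(8, 12))),
  sex  = factor(c(rep("M", 5), rep("F", 3), rep("M", 4), rep("F", 8)))
)
dat$gain <- 1.2 + 0.3 * (dat$diet == "B") +
  0.1 * (dat$sex == "M") + rnorm(nrow(dat), sd = 0.2)

m <- lm(gain ~ diet + sex, data = dat)

anova(m)               # Type I: diet SS ignores sex
Anova(m, type = "II")  # Type II: diet SS adjusted for sex
```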

22.2 Power and Sample Size

22.2.1 Power Analysis for ANOVA

Before conducting a study, calculate required sample size:

Code
library(pwr)

# Power analysis for one-way ANOVA
# Effect size f (Cohen's f):
# f = 0.10 (small), 0.25 (medium), 0.40 (large)

# Scenario: Want to detect medium effect (f = 0.25)
# 4 groups, α = 0.05, power = 0.80

power_result <- pwr.anova.test(
  k = 4,              # Number of groups
  f = 0.25,           # Effect size (medium)
  sig.level = 0.05,   # Alpha
  power = 0.80        # Desired power
)

print(power_result)

     Balanced one-way analysis of variance power calculation 

              k = 4
              n = 44.59927
              f = 0.25
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group
Code
cat(sprintf("\nRequired sample size: %.0f per group\n", ceiling(power_result$n)))

Required sample size: 45 per group
Code
cat(sprintf("Total sample size needed: %.0f\n", 4 * ceiling(power_result$n)))
Total sample size needed: 180

Note: Cohen’s f is related to η² by:

\[f = \sqrt{\frac{\eta^2}{1 - \eta^2}}\]
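This conversion is easy to compute directly (the helper names below are our own, not from any package):

```r
# Convert between eta-squared and Cohen's f
eta2_to_f <- function(eta2) sqrt(eta2 / (1 - eta2))
f_to_eta2 <- function(f) f^2 / (1 + f^2)

eta2_to_f(0.06)  # medium eta-squared -> f of about 0.25
f_to_eta2(0.25)  # medium f -> eta-squared of about 0.06
```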

22.2.2 Power Curves

Code
# Calculate power for different sample sizes and effect sizes
power_data <- expand_grid(
  n_per_group = seq(10, 50, by = 5),
  effect_size = c(0.10, 0.25, 0.40)
) %>%
  mutate(
    effect_label = case_when(
      effect_size == 0.10 ~ "Small (f = 0.10)",
      effect_size == 0.25 ~ "Medium (f = 0.25)",
      effect_size == 0.40 ~ "Large (f = 0.40)"
    ),
    power = map2_dbl(n_per_group, effect_size, ~ {
      pwr.anova.test(k = 4, n = .x, f = .y, sig.level = 0.05)$power
    })
  )

ggplot(power_data, aes(x = n_per_group, y = power, color = effect_label)) +
  geom_line(linewidth = 1.2) +
  geom_hline(yintercept = 0.80, linetype = "dashed", color = "gray40") +
  annotate("text", x = 45, y = 0.82, label = "Target: 80% power", size = 3) +
  scale_color_brewer(palette = "Set1") +
  labs(title = "Statistical Power for One-Way ANOVA",
       subtitle = "4 groups, α = 0.05",
       x = "Sample Size per Group",
       y = "Power",
       color = "Effect Size") +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.2)) +
  theme(legend.position = "bottom")

22.3 Common Mistakes and Pitfalls

22.3.1 1. Running Multiple t-Tests Instead of ANOVA

Problem: Inflates Type I error rate

Solution: Use ANOVA first, then post-hoc tests if significant

22.3.2 2. Checking Assumptions on Raw Groups

Problem: Running normality/variance tests on each group separately

Solution: Check assumptions on the residuals from the fitted model

22.3.3 3. Ignoring Post-Hoc Adjustments

Problem: Using adjust = "none" in post-hoc tests

Solution: Always use adjustment (Tukey, Bonferroni, etc.)

22.3.4 4. Reporting Only P-values

Problem: No effect sizes or descriptive statistics

Solution: Report means, SDs, effect sizes, and confidence intervals

22.3.5 5. Treating ANOVA as Proof of Causation

Problem: Correlation vs causation confusion

Solution: Only randomized experiments support causal claims

22.4 Reporting ANOVA Results

22.4.1 Example Write-Up

A one-way ANOVA was conducted to compare average daily gain across four feed formulations (A, B, C, D) in finishing steers (n = 15 per group). Normality (assessed via a Q-Q plot of residuals) and homogeneity of variance (Levene’s test: p = 0.68) were satisfactory.

Feed type significantly affected ADG, F(3, 56) = 12.45, p < 0.001, ω² = 0.35, indicating a large effect. Mean ADG was 1.45 kg/day (SD = 0.18) for Feed A, 1.55 kg/day (SD = 0.17) for Feed B, 1.50 kg/day (SD = 0.19) for Feed C, and 1.70 kg/day (SD = 0.16) for Feed D.

Tukey HSD post-hoc tests revealed that Feed D produced significantly higher ADG than all other feeds (all p < 0.01). No significant differences were found among Feeds A, B, and C (all p > 0.20).

22.4.2 APA-Style Table

Code
# Create publication-ready table
anova_table <- tibble(
  Source = c("Feed Type", "Residual"),
  SS = c(feed_anova$`Sum Sq`[1], feed_anova$`Sum Sq`[2]),
  df = c(feed_anova$Df[1], feed_anova$Df[2]),
  MS = c(SS[1]/df[1], SS[2]/df[2]),
  F = c(feed_anova$`F value`[1], NA),
  p = c(feed_anova$`Pr(>F)`[1], NA)
)

knitr::kable(anova_table, digits = c(0, 3, 0, 3, 2, 4),
             col.names = c("Source", "SS", "df", "MS", "F", "p"),
             caption = "ANOVA Table for Feed Type Effect on Average Daily Gain",
             align = 'lccccc')
ANOVA Table for Feed Type Effect on Average Daily Gain
Source SS df MS F p
Feed Type 0.737 3 0.246 7.28 3e-04
Residual 1.890 56 0.034 NA NA

23 Summary and Key Takeaways

23.1 What We Covered

  1. Multiple comparisons problem: Running multiple t-tests inflates Type I error rate
  2. ANOVA logic: Partitions variance into between-group and within-group components
  3. F-statistic: Ratio of between to within variance; large F suggests group differences
  4. Linear model approach: Using lm() and car::Anova() provides flexibility and connects to regression
  5. Type II/III SS: Important for unbalanced designs; use Type II for one-way ANOVA
  6. Assumptions: Independence, normality of residuals, homogeneity of variance
  7. Post-hoc tests: Tukey HSD, Bonferroni, and others for identifying specific group differences
  8. Effect sizes: η² and ω² quantify proportion of variance explained
  9. Power analysis: Plan sample sizes before data collection
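Takeaway 1 can be verified directly: with m independent tests each run at α = 0.05, the chance of at least one false positive is 1 − (1 − α)^m:

```r
# Family-wise error rate for m independent tests at level alpha
fwer <- function(m, alpha = 0.05) 1 - (1 - alpha)^m

fwer(1)  # 0.05 -- a single test
fwer(6)  # ~0.26 -- the six pairwise tests for four groups
```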

23.2 Key Principles

Important: Core Principles of ANOVA
  1. ANOVA controls FWER: Single test for multiple groups maintains α = 0.05

  2. Significant F means “at least one difference”: Doesn’t tell you which groups differ

  3. Always use post-hoc tests: If ANOVA is significant, conduct adjusted pairwise comparisons

  4. Check residuals, not groups: Assumptions apply to model residuals

  5. Report effect sizes: Statistical significance ≠ practical importance

  6. Use modern tools: lm() + car::Anova() + emmeans is the recommended workflow

23.3 ANOVA Decision Framework

Step-by-step workflow:

  1. Visualize data: Boxplots, violin plots, means with error bars
  2. Fit model: model <- lm(outcome ~ group, data = df)
  3. Run ANOVA: car::Anova(model, type = "II")
  4. Check assumptions: Diagnostic plots, Levene’s test, Q-Q plot
  5. If significant: Conduct post-hoc tests with emmeans
  6. Calculate effect size: omega_squared(model)
  7. Report completely: F-statistic, p-value, effect size, means, SDs, post-hoc results
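The workflow above, condensed into one runnable sketch (simulated data stand in for your own; column names outcome and group match step 2):

```r
library(car)
library(emmeans)
library(effectsize)

# Simulated stand-in data: 3 groups, 10 animals each
set.seed(42)
df <- data.frame(
  group   = factor(rep(c("A", "B", "C"), each = 10)),
  outcome = rnorm(30, mean = rep(c(10, 12, 11), each = 10), sd = 1)
)

model <- lm(outcome ~ group, data = df)            # 2. fit model
Anova(model, type = "II")                          # 3. omnibus ANOVA
leveneTest(outcome ~ group, data = df)             # 4. homogeneity of variance
plot(model, which = 2)                             # 4. Q-Q plot of residuals
pairs(emmeans(model, ~ group), adjust = "tukey")   # 5. post-hoc tests
omega_squared(model)                               # 6. effect size
```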

23.4 R Functions Summary

Function           Package      Purpose
lm()               stats        Fit linear model (ANOVA)
Anova()            car          ANOVA table with Type II/III SS
emmeans()          emmeans      Estimated marginal means
pairs()            emmeans      Post-hoc pairwise comparisons
cld()              emmeans      Compact letter display
leveneTest()       car          Test homogeneity of variance
eta_squared()      effectsize   Effect size (η²)
omega_squared()    effectsize   Effect size (ω²)
pwr.anova.test()   pwr          Power analysis
oneway.test()      stats        Welch’s ANOVA
kruskal.test()     stats        Non-parametric alternative

23.5 Connection to Regression (Preview)

Note: ANOVA is Regression!

One-way ANOVA is actually linear regression with a categorical predictor. R converts your factor into dummy variables behind the scenes.

Coming in Weeks 7-8: We’ll see how ANOVA and regression are unified under the general linear model framework. Everything you learned this week applies directly to regression!
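You can peek at the dummy coding R builds with model.matrix() (a toy four-level factor here, standing in for feed_type):

```r
# How R dummy-codes a factor behind the scenes
feed <- factor(c("A", "B", "C", "D"))
model.matrix(~ feed)
# "(Intercept)" column represents the reference level (A);
# feedB, feedC, feedD are 0/1 indicator columns for the other levels
```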

23.6 Next Week Preview

Week 6: Categorical Data Analysis

  • Chi-square tests (goodness-of-fit, independence)
  • Fisher’s exact test
  • Risk ratios and odds ratios
  • 2×2 contingency tables
  • Moving from continuous to categorical outcomes

24 Practice Problems

24.1 Problem 1: Beef Breed Comparison

A rancher wants to compare weight gain across three beef breeds. Here’s the data:

Code
# Fix the random seed so the simulated data are reproducible (any value works)
set.seed(500)

breed_data <- tibble(
  breed = rep(c("Angus", "Hereford", "Simmental"), each = 12),
  weight_gain_kg = c(
    rnorm(12, mean = 320, sd = 35),
    rnorm(12, mean = 305, sd = 32),
    rnorm(12, mean = 340, sd = 38)
  )
)

Tasks:

  1. Visualize the data with boxplots
  2. Fit a linear model and run ANOVA
  3. Check assumptions (diagnostic plots, Levene’s test)
  4. If significant, conduct Tukey HSD post-hoc tests
  5. Calculate omega-squared effect size
  6. Write a results paragraph

24.2 Problem 2: Unbalanced Design

You have data on milk yield from four dairy management systems, but with unequal sample sizes:

  • System A: n = 15
  • System B: n = 12
  • System C: n = 18
  • System D: n = 10

Tasks:

  1. Explain why this is an unbalanced design
  2. Which type of sums of squares should you use?
  3. How would you handle this in your analysis?

24.3 Problem 3: Power Analysis

You’re planning a study to compare five feeding regimens. Based on pilot data, you expect a medium effect size (f = 0.25).

Tasks:

  1. Calculate required sample size per group for 80% power
  2. Calculate required sample size for 90% power
  3. If you can only recruit 20 animals per group, what power will you have?
Code
# Your code here
library(pwr)

# Part a
pwr.anova.test(k = 5, f = 0.25, sig.level = 0.05, power = 0.80)

# Part b
pwr.anova.test(k = 5, f = 0.25, sig.level = 0.05, power = 0.90)

# Part c
pwr.anova.test(k = 5, n = 20, f = 0.25, sig.level = 0.05)

24.4 Problem 4: Assumption Violations

Your ANOVA shows:

  • Levene’s test: p = 0.02 (unequal variances)
  • Shapiro-Wilk on residuals: p = 0.35 (normality OK)

Questions:

  1. Which assumption is violated?
  2. What are two ways to address this?
  3. Write the R code for an alternative analysis

24.5 Problem 5: Interpretation

You conduct ANOVA on 4 groups (n = 20 each) and get:

  • F(3, 76) = 2.15, p = 0.10
  • Omega-squared = 0.04

Questions:

  1. Is the overall ANOVA significant?
  2. Should you conduct post-hoc tests? Why or why not?
  3. What is the effect size interpretation?
  4. If you had gotten p = 0.03, would post-hoc tests be guaranteed to find significant pairs?

25 Additional Resources

25.1 Recommended Reading

  • Maxwell & Delaney (2004). Designing Experiments and Analyzing Data. Comprehensive ANOVA coverage
  • Gelman & Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Modern perspective
  • Keppel & Wickens (2004). Design and Analysis: A Researcher’s Handbook. Classic experimental design text

25.2 Online Resources

  • UCLA Statistical Consulting: https://stats.oarc.ucla.edu/r/
  • R Graphics Cookbook: https://r-graphics.org/
  • emmeans package vignette: Excellent guide to post-hoc tests

25.3 Papers on Type I/II/III SS

  • Langsrud (2003). “ANOVA for unbalanced data: Use Type II instead of Type III sums of squares”
  • Shaw & Mitchell-Olds (1993). “ANOVA for unbalanced data: An overview”

End of Week 5 Lecture

Next week: Categorical Data Analysis - moving from continuous outcomes to counts and proportions!